
Invoke StopAll() in terminal exit pathways when service startup fails #68

Merged
dzbarsky merged 1 commit into hermeticbuild:master from bmcdonnel:stop-services-on-startup-failure
Jun 2, 2026

@bmcdonnel bmcdonnel commented May 26, 2026

Contributor

Refs #67.

Summary

  • introduce mustStopAllForExit func which invokes r.StopAll()
  • call mustStopAllForExit() in terminal exit pathways only:
    • startup is canceled while services are starting.
    • a service startup error occurs before all services are healthy.
    • the main context is canceled after startup.
    • the wrapped test fails in one-shot mode.
    • a managed service exits in one-shot mode.
    • normal one-shot shutdown after the test path completes.
  • add test cases for this new behavior
    • marked cleanup_on_startup_failure_test as NOT_WINDOWS due to .sh script needed to test failure cleanup behavior
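To make the listed exit pathways concrete, here is a minimal sketch of a one-shot flow in which every terminal path stops the services before returning. Everything in it is hypothetical: `runOneShot`, `errStartup`, the `start` and `test` callbacks, and the exit codes are illustrative stand-ins, and `stopAll` only plays the role of the PR's `mustStopAllForExit`. It is not the code in `cmd/svcinit/main.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// errStartup is a hypothetical error used to simulate a failing step.
var errStartup = errors.New("service exited before healthy")

// runOneShot is a hypothetical, heavily simplified one-shot flow. start and
// test stand in for service startup and the wrapped test; stopAll stands in
// for the PR's mustStopAllForExit helper. The exit codes are assumptions.
func runOneShot(start, test func() error, stopAll func()) int {
	if err := start(); err != nil {
		stopAll() // startup failed before all services were healthy
		return 1
	}
	if err := test(); err != nil {
		stopAll() // the wrapped test failed
		return 1
	}
	stopAll() // normal shutdown after the test path completes
	return 0
}

func main() {
	stops := 0
	stopAll := func() { stops++ }
	ok := func() error { return nil }
	fail := func() error { return errStartup }

	codeStartFail := runOneShot(fail, ok, stopAll)
	codeTestFail := runOneShot(ok, fail, stopAll)
	codeOK := runOneShot(ok, ok, stopAll)

	// prints "1 1 0 3": each of the three runs stopped the services once.
	fmt.Println(codeStartFail, codeTestFail, codeOK, stops)
}
```

The point of the sketch is only that no terminal path returns without a stop call; the cancellation and managed-service-exit paths from the list would follow the same pattern.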

Validation

  • cd tests && bazel test --cache_test_results=no --test_output=errors //startup_failure:cleanup_on_startup_failure_test //startup_failure:service_exits_before_healthy_failure
  • cd tests && bazel test --cache_test_results=no --test_output=errors //...

Screenshot of new test case output

See cleanup service received shutdown.
[Image: Screenshot 2026-05-26 at 3 49 07 PM]

@bmcdonnel bmcdonnel force-pushed the stop-services-on-startup-failure branch 4 times, most recently from 7682bff to a5d60b0 Compare May 26, 2026 22:05

@dzbarsky dzbarsky left a comment

Member

the motivation makes sense! can you please take a look at how this interacts with the ibazel reload loop? that's the only bit I'm worried about :)

Comment thread cmd/svcinit/main.go
mustStopAll()
return
}
if err != nil {

Member

nit: combine with the above? like

if err != nil {
    mustStopAll()
    if errors.Is(err, context.Canceled) {
        return
    }
}

Contributor Author

updated! much cleaner
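The combined branch suggested in this thread can be sketched in isolation as follows. `handleStartupErr` and the two sample errors are hypothetical names introduced for illustration; only the shape (stop on any error, then check `errors.Is(err, context.Canceled)`) comes from the suggestion. The sketch also shows that `errors.Is` still matches when the cancellation is wrapped.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sample errors for the demonstration.
var (
	errCanceledWrapped = fmt.Errorf("starting services: %w", context.Canceled)
	errOther           = errors.New("port already in use")
)

// handleStartupErr is a hypothetical helper mirroring the suggested shape:
// stop services on any startup error, then report whether the error was a
// cancellation so the caller can return quietly instead of reporting it.
func handleStartupErr(err error, stop func()) (canceled bool) {
	if err == nil {
		return false
	}
	stop()
	return errors.Is(err, context.Canceled)
}

func main() {
	stops := 0
	stop := func() { stops++ }

	canceled := handleStartupErr(errCanceledWrapped, stop)
	other := handleStartupErr(errOther, stop)
	none := handleStartupErr(nil, stop)

	// prints "true false false 2": both real errors triggered a stop,
	// and only the wrapped cancellation is reported as canceled.
	fmt.Println(canceled, other, none, stops)
}
```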

Comment thread cmd/svcinit/main.go Outdated
r, err := runner.New(ctx, serviceSpecs)
must(err)

mustStopAll := sync.OnceValue(func() map[string]*os.ProcessState {

Member

mid-review note: how does this work with reload? Is there any way that services can get restarted after this is called (and then this becomes a no-op in the future)?

Contributor Author

renamed to mustStopAllForExit to signal that it is intended for final shutdown pathways, not reload pathways.

TODO(zbarsky): what is the right behavior here when services are crashing in ibazel mode?

I saw this TODO and wasn't sure what the right approach would even be for reload failure cleanup. open to suggestions if we should try to address that in this PR! 🙏

Member

oh ok if it's not getting called during reload this should be no worse than before!
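The concern raised in this thread can be reproduced with a small sketch: a function wrapped in `sync.OnceValue` runs only on its first call, so if services were restarted afterwards, a second call would return the cached result and stop nothing. The `services` type, its methods, and `newStopOnce` are hypothetical stand-ins and do not reflect the real runner's API.

```go
package main

import (
	"fmt"
	"sync"
)

// services is a hypothetical stand-in that only tracks how many services
// are currently running.
type services struct{ running int }

// StartAll pretends to start two services.
func (s *services) StartAll() { s.running = 2 }

// StopAll pretends to stop everything and returns how many were stopped.
func (s *services) StopAll() int {
	stopped := s.running
	s.running = 0
	return stopped
}

// newStopOnce wraps StopAll so that it executes at most once.
func newStopOnce(s *services) func() int {
	return sync.OnceValue(s.StopAll)
}

func main() {
	s := &services{}
	stopOnce := newStopOnce(s)

	s.StartAll()
	first := stopOnce()
	fmt.Println(first, s.running) // prints "2 0": services were stopped

	s.StartAll() // simulate a restart after the once-wrapped stop has fired
	second := stopOnce()
	fmt.Println(second, s.running) // prints "2 2": cached result, nothing stopped
}
```

This is why restricting the helper to final shutdown paths matters: once it has fired, it cannot clean up anything started later.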

@bmcdonnel bmcdonnel force-pushed the stop-services-on-startup-failure branch 2 times, most recently from 15c0298 to 9c28ff4 Compare May 26, 2026 22:32
@bmcdonnel bmcdonnel force-pushed the stop-services-on-startup-failure branch from 9c28ff4 to 3b55cb0 Compare May 26, 2026 22:40

cleanup_failure_test(
name = "cleanup_on_startup_failure_test",
target_compatible_with = NOT_WINDOWS,

@bmcdonnel bmcdonnel May 26, 2026

Contributor Author

marked this test as NOT_WINDOWS due to .sh script needed for testing failure cleanup behavior. but apparently, runner/pgroup_windows.go uses cmd.Process.Kill() which (i'm told) isn't really trappable anyway? So maybe this test isn't really doable on Windows ...

Member

yeah i think that's ok, windows is very best-effort
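For context on why the cleanup test depends on a trappable signal, here is a Unix-only sketch of a process observing SIGTERM through `os/signal`. A test service can log "received shutdown" from such a handler. A forced kill (SIGKILL on Unix, or `Process.Kill` on Windows) gives the process no chance to run a handler. `trapSIGTERM` is a hypothetical helper, and the sketch makes no claim about what `runner/pgroup_windows.go` actually does.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// trapSIGTERM registers a handler, sends SIGTERM to the current process, and
// reports whether the handler observed it. Unix-only: syscall.Kill is not
// available on Windows.
func trapSIGTERM() bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTERM) // from here on, SIGTERM is delivered to ch
	defer signal.Stop(ch)

	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return false
	}
	select {
	case <-ch:
		return true // a real service would run its shutdown logic here
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("trapped SIGTERM:", trapSIGTERM())
}
```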

@bmcdonnel bmcdonnel marked this pull request as ready for review May 26, 2026 22:52
@bmcdonnel bmcdonnel requested a review from dzbarsky May 27, 2026 21:35
@bmcdonnel bmcdonnel changed the title Stop services when startup fails Invoke StopAll() in terminal exit pathways when service startup fails May 28, 2026
@bmcdonnel

Contributor Author

@dzbarsky, let me know if you need any other evidence or changes on this PR. would love to get this merged v soon! thanks so much 🙏

@dzbarsky dzbarsky merged commit 046ff3b into hermeticbuild:master Jun 2, 2026
2 checks passed

dzbarsky commented Jun 2, 2026

Member

> @dzbarsky, let me know if you need any other evidence or changes on this PR. would love to get this merged v soon! thanks so much 🙏

sorry, i could have sworn i hit the merge button when posting those comments. i'll cut a release now
